
    Understanding Data Manipulation and How to Leverage It to Improve Generalization

    Augmentations and other transformations of data, in either the input or the latent space, are a critical component of modern machine learning systems. Although these techniques are widely used in practice and are known to improve generalization in many cases, it remains unclear how data manipulation affects learning and generalization. As a step toward addressing this problem, this thesis focuses on understanding and leveraging data augmentation and alignment to improve machine learning performance and transfer. In the first part of the thesis, we establish a novel theoretical framework for understanding how data augmentation (DA) affects learning in linear regression and classification tasks. The results show that the spectrum of the augmented (transformed) data plays a key role in characterizing the behavior of different augmentation strategies, especially in the overparameterized regime. The tools developed in this part provide simple guidelines for building new augmentation strategies and a framework for comparing the generalization of different types of DA. In the second part of the thesis, we demonstrate how latent data alignment can be used to tackle the domain transfer problem, in which the training and testing datasets differ in distribution. Our algorithm builds on joint clustering and data matching through optimal transport, and it outperforms pure matching baselines on both synthetic and real datasets. Extensions of the generalization analysis and algorithm design for data augmentation and alignment to nonlinear models, such as artificial neural networks and random feature models, are also discussed. This thesis provides tools and analyses for better data manipulation design that benefit both supervised and unsupervised learning schemes.
    Ph.D.
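    The abstract states that the spectrum of the augmented data governs how an augmentation strategy behaves. A minimal, classical special case (chosen here for illustration; it is not the thesis's framework) is additive Gaussian input noise in linear least squares. Averaged over the noise, the augmented Gram matrix becomes X^T X + n*sigma^2*I, so every eigenvalue of the data spectrum is shifted by n*sigma^2, and the augmented fit approaches ridge regression with lambda = n*sigma^2. The sketch below checks this numerically in an overparameterized setting (d > n); the function names, problem sizes, and noise level are hypothetical.

```python
import numpy as np


def noise_augmented_least_squares(X, y, sigma, copies, rng):
    """Least squares on `copies` Gaussian-noise replicas of (X, y).

    Accumulates the normal equations copy by copy, which is equivalent to
    solving ordinary least squares on the stacked augmented dataset.
    """
    n, d = X.shape
    gram = np.zeros((d, d))
    rhs = np.zeros(d)
    for _ in range(copies):
        X_noisy = X + sigma * rng.standard_normal((n, d))  # augmented inputs
        gram += X_noisy.T @ X_noisy
        rhs += X_noisy.T @ y  # labels are kept unchanged
    return np.linalg.solve(gram, rhs)


def ridge(X, y, lam):
    """Closed-form ridge regression: (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


rng = np.random.default_rng(0)
n, d, sigma = 10, 30, 1.0  # overparameterized: more features than samples
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w_aug = noise_augmented_least_squares(X, y, sigma, copies=10_000, rng=rng)
w_ridge = ridge(X, y, lam=n * sigma**2)  # penalty predicted by the spectrum shift
w_min_norm = np.linalg.pinv(X) @ y  # unaugmented minimum-norm interpolator

rel_err = np.linalg.norm(w_aug - w_ridge) / np.linalg.norm(w_ridge)
print(f"relative gap to ridge: {rel_err:.3f}")
```

    The printed gap should be small and should shrink as `copies` grows, since the random deviation from the expected Gram matrix averages out at roughly a 1/sqrt(copies) rate. Accumulating the normal equations, rather than stacking all replicas, keeps memory independent of the number of copies.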

    Little String Amplitudes (and the Unreasonable Effectiveness of 6D SYM)

    We study tree-level scattering amplitudes of four massless states in the double-scaled little string theory and compare them to perturbative loop amplitudes in six-dimensional super-Yang-Mills (SYM) theory. The little string amplitudes are computed from correlators in the cigar coset CFT and in N=2 minimal models. The results are expressed in terms of integrals of conformal blocks and evaluated numerically in the alpha' expansion. We find striking agreement with the scattering amplitudes of massless gluons in 6D SU(k) SYM, up to two loops, at a Z_k-invariant point on the Coulomb branch. We comment on the issue of UV divergences at higher loop orders in the gauge theory and discuss the implications of our results.
    Comment: 58 pages, 5 figures, 3 tables, comments added, references added

    Topological Defect Lines and Renormalization Group Flows in Two Dimensions

    We consider topological defect lines (TDLs) in two-dimensional conformal field theories. Generalizing and encompassing both global symmetries and Verlinde lines, TDLs together with their attached defect operators provide models of fusion categories without braiding. We study the crossing relations of TDLs, discuss their relation to 't Hooft anomalies, and use them to constrain renormalization group (RG) flows to either conformal critical points or topological quantum field theories (TQFTs). We show that if certain non-invertible TDLs are preserved along an RG flow, then the vacuum cannot be a non-degenerate gapped state. For various massive flows, we determine the infrared TQFTs completely by considering TDLs together with modular invariance.
    Comment: 101 pages, 63 figures, 2 tables; v3: minor changes, added footnotes and references, published version
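    To make "non-invertible" concrete, consider an illustrative line W (an example chosen here, not named in the abstract) obeying the Fibonacci-type fusion rule W × W = 1 + W, where 1 is the identity line. Because the fusion product contains W as well as 1, W has no inverse. Loop expectation values respect fusion, so taking them on both sides gives:

```latex
\langle W \rangle^{2} = \langle 1 \rangle + \langle W \rangle = 1 + \langle W \rangle
\quad\Longrightarrow\quad
\langle W \rangle = \frac{1 \pm \sqrt{5}}{2}
```

    Neither root is an integer, whereas an invertible line L with L × L̄ = 1 satisfies ⟨L⟩⟨L̄⟩ = 1. This sketch covers only the fusion arithmetic; the precise conditions under which a preserved line rules out a non-degenerate gapped vacuum are given in the paper and are not reproduced here.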